NSF PAR Search | NSF Public Access Repository

Reconstruction of the human amylase locus reveals ancient duplications seeding modern-day variation

https://doi.org/10.1126/science.adn0609

Yilmaz, Feyza; Karageorgiou, Charikleia; Kim, Kwondo; Pajic, Petar; Scheer, Kendra; Beck, Christine R; Torregrossa, Ann-Marie; Lee, Charles; Gokcumen, Omer; Audano, Peter A; et al (November 2024, Science)

Previous studies suggested that the copy number of the human salivary amylase gene, AMY1, correlates with starch-rich diets. However, evolutionary analyses are hampered by the absence of accurate, sequence-resolved haplotype variation maps. We identified 30 structurally distinct haplotypes at nucleotide resolution among 98 present-day humans, revealing that the coding sequences of AMY1 copies are evolving under negative selection. Genomic analyses of these haplotypes in archaic hominins and ancient human genomes suggest that a common three-copy haplotype, dating as far back as 800,000 years ago, has seeded rapidly evolving rearrangements through recurrent nonallelic homologous recombination. Additionally, haplotypes with more than three AMY1 copies have significantly increased in frequency among European farmers over the past 4000 years, potentially as an adaptive response to increased starch digestion.
Full Text Available
Chromosome-length genome assembly and karyotype of the endangered black-footed ferret (Mustela nigripes)

https://doi.org/10.1093/jhered/esad035

Kliver, Sergei; Houck, Marlys L; Perelman, Polina L; Totikov, Azamat; Tomarovsky, Andrey; Dudchenko, Olga; Omer, Arina D; Colaric, Zane; Weisz, David; Aiden, Erez Lieberman; et al (May 2023, Journal of Heredity)
Oleksyk, Taras (Ed.)
The black-footed ferret (Mustela nigripes) narrowly avoided extinction to become an oft-cited example of the benefits of intensive management, research, and collaboration to save a species through ex situ conservation breeding and reintroduction into its former range. However, the species remains at risk due to possible inbreeding, disease susceptibility, and multiple fertility challenges. Here, we report the de novo genome assembly of a male black-footed ferret generated through a combination of linked-read sequencing, optical mapping, and Hi-C proximity ligation. In addition, we report the karyotype for this species, which was used to anchor and assign chromosome numbers to the chromosome-length scaffolds. The draft assembly was ~2.5 Gb in length, with 95.6% of it anchored to 19 chromosome-length scaffolds, corresponding to the 2n = 38 chromosomes revealed by the karyotype. The assembly has contig and scaffold N50 values of 148.8 kbp and 145.4 Mbp, respectively, and is up to 96% complete based on BUSCO analyses. Annotation of the assembly, including evidence from RNA-seq data, identified 21,406 protein-coding genes and a repeat content of 37.35%. Phylogenomic analyses indicated that the black-footed ferret diverged from the European polecat/domestic ferret lineage 1.6 million yr ago. This assembly will enable research on the conservation genomics of black-footed ferrets and thereby aid in the further restoration of this endangered species.
Full Text Available
Semi-automated assembly of high-quality diploid human reference genomes

https://doi.org/10.1038/s41586-022-05325-5

Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; et al (November 2022, Nature)

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Full Text Available
Towards complete and error-free genome assemblies of all vertebrate species

https://doi.org/10.1038/s41586-021-03451-0

Rhie, Arang; McCarthy, Shane A.; Fedrigo, Olivier; Damas, Joana; Formenti, Giulio; Koren, Sergey; Uliano-Silva, Marcela; Chow, William; Fungtammasan, Arkarachai; Kim, Juwan; et al (April 2021, Nature)
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Full Text Available